Overview

Dataset statistics

Number of variables: 10
Number of observations: 296760
Missing cells: 254500
Missing cells (%): 8.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.8 MiB
Average record size in memory: 151.4 B

Variable types

NUM: 8
CAT: 1
DATE: 1

Reproduction

Analysis started: 2020-04-13 00:11:08.727311
Analysis finished: 2020-04-13 00:23:10.726193
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

$SO_2$ (µg/m3) has 43871 (14.8%) missing values
CO (ppm) has 40287 (13.6%) missing values
$O_3$ (µg/m3) has 42226 (14.2%) missing values
$PM_{10}$ (µg/m3) has 46542 (15.7%) missing values
$NO_2$ (µg/m3) has 40747 (13.7%) missing values
NO (µg/m3) has 40827 (13.8%) missing values
$SO_2$ (µg/m3) is highly skewed (γ1 = 34.17882096)
$SO_2$ (µg/m3) has 78999 (26.6%) zeros
CO (ppm) has 11044 (3.7%) zeros
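
As a consistency check, the per-variable missing counts listed in the warnings should add up to the "Missing cells" total in the dataset statistics. The following is a minimal sketch in plain Python, with the counts copied by hand from this report; the short dictionary keys are labels chosen for this example, not the dataset's column names.

```python
# Missing-value counts per variable, copied from the warnings in this report.
# The keys are shorthand labels chosen for this example.
missing_by_variable = {
    "SO2": 43871,
    "CO": 40287,
    "O3": 42226,
    "PM10": 46542,
    "NO2": 40747,
    "NO": 40827,
}

n_observations = 296760  # "Number of observations"
n_variables = 10         # "Number of variables"

total_missing = sum(missing_by_variable.values())
missing_pct = 100 * total_missing / (n_observations * n_variables)

print(total_missing)          # 254500
print(f"{missing_pct:.1f}%")  # 8.6%

# Share of missing values within each variable.
for name, count in missing_by_variable.items():
    print(f"{name}: {100 * count / n_observations:.1f}%")
```

Both printed totals agree with the "Missing cells" and "Missing cells (%)" entries, which suggests that all missing cells come from the six pollutant variables.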

Variables

df_index
Date

Distinct count: 43824
Unique (%): 14.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB
Minimum: 2011-01-01 00:00:00
Maximum: 2015-12-31 23:00:00
Histogram

$SO_2$ (µg/m3)
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS
Distinct count: 15119
Unique (%): 6.0%
Missing: 43871
Missing (%): 14.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1579326887953874
Minimum: 0.0
Maximum: 273.708955847191
Zeros: 78999
Zeros (%): 26.6%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.5228240342
Q3: 1.563885299
95-th percentile: 4.135172947
Maximum: 273.7089558
Range: 273.7089558
Interquartile range (IQR): 1.563885299

Descriptive statistics

Standard deviation: 2.296314677
Coefficient of variation (CV): 1.983115858
Kurtosis: 2950.186933
Mean: 1.157932689
Median Absolute Deviation (MAD): 1.13210345
Skewness: 34.17882096
Sum: 292828.4397
Variance: 5.273061098
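
Several of the figures above can be derived from others, which allows a quick sanity check. The sketch below assumes the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared); the report does not state its formulas, so these definitions are an assumption. The inputs are the rounded $SO_2$ values printed above.

```python
from math import isclose

# Rounded SO2 statistics copied from this report.
mean = 1.157932689
std = 2.296314677
q1, q3 = 0.0, 1.563885299
minimum, maximum = 0.0, 273.7089558

# Derived quantities under the assumed textbook definitions.
cv = std / mean
iqr = q3 - q1
value_range = maximum - minimum
variance = std ** 2

print(f"CV       = {cv:.6f}")
print(f"IQR      = {iqr:.6f}")
print(f"Range    = {value_range:.6f}")
print(f"Variance = {variance:.6f}")

# Compare against the reported figures, allowing for rounding of the inputs.
assert isclose(cv, 1.983115858, abs_tol=1e-6)
assert isclose(iqr, 1.563885299, abs_tol=1e-6)
assert isclose(value_range, 273.7089558, abs_tol=1e-6)
assert isclose(variance, 5.273061098, abs_tol=1e-6)
```

The same checks can be repeated for the other numeric variables by swapping in their reported values.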
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 78999 | 26.6%
0.5208641808 | 2363 | 0.8%
0.7812962712 | 2261 | 0.8%
0.2604320904 | 2231 | 0.8%
0.2606475499 | 2033 | 0.7%
0.2611202188 | 1992 | 0.7%
1.041728362 | 1940 | 0.7%
0.2600073888 | 1921 | 0.6%
0.2611377503 | 1883 | 0.6%
0.5200147776 | 1801 | 0.6%
Other values (15109) | 155465 | 52.4%
(Missing) | 43871 | 14.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 78999 | 26.6%
0.2543025298 | 1 | < 0.1%
0.2544061069 | 1 | < 0.1%
0.2545720059 | 1 | < 0.1%
0.2545927585 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
273.7089558 | 1 | < 0.1%
246.4349729 | 1 | < 0.1%
233.8691658 | 1 | < 0.1%
210.3116422 | 1 | < 0.1%
209.9314216 | 1 | < 0.1%

CO (ppm)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 445
Unique (%): 0.2%
Missing: 40287
Missing (%): 13.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3798388134423507
Minimum: 0.0
Maximum: 7.66
Zeros: 11044
Zeros (%): 3.7%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.01
Q1: 0.17
Median: 0.33
Q3: 0.53
95-th percentile: 0.9
Maximum: 7.66
Range: 7.66
Interquartile range (IQR): 0.36

Descriptive statistics

Standard deviation: 0.3133495106
Coefficient of variation (CV): 0.8249539002
Kurtosis: 23.10958797
Mean: 0.3798388134
Median Absolute Deviation (MAD): 0.2248706058
Skewness: 2.804800954
Sum: 97418.4
Variance: 0.09818791578
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 11044 | 3.7%
0.01 | 4667 | 1.6%
0.2 | 4051 | 1.4%
0.21 | 4031 | 1.4%
0.26 | 4025 | 1.4%
0.24 | 4008 | 1.4%
0.23 | 4007 | 1.4%
0.17 | 3979 | 1.3%
0.28 | 3971 | 1.3%
0.25 | 3967 | 1.3%
Other values (435) | 208723 | 70.3%
(Missing) | 40287 | 13.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 11044 | 3.7%
0.01 | 4667 | 1.6%
0.02 | 3475 | 1.2%
0.03 | 3037 | 1.0%
0.04 | 2780 | 0.9%

Maximum 5 values
Value | Count | Frequency (%)
7.66 | 1 | < 0.1%
6.26 | 1 | < 0.1%
6.09 | 1 | < 0.1%
6.05 | 1 | < 0.1%
5.99 | 1 | < 0.1%

$O_3$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 63252
Unique (%): 24.9%
Missing: 42226
Missing (%): 14.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.592120864219352
Minimum: 0.0
Maximum: 151.80936261525568
Zeros: 1117
Zeros (%): 0.4%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2.175734911
Q1: 7.307514244
Median: 12.87810793
Q3: 19.95511802
95-th percentile: 32.99522494
Maximum: 151.8093626
Range: 151.8093626
Interquartile range (IQR): 12.64760378

Descriptive statistics

Standard deviation: 9.640177003
Coefficient of variation (CV): 0.660642623
Kurtosis: 1.709970747
Mean: 14.59212086
Median Absolute Deviation (MAD): 7.575743751
Skewness: 1.058608995
Sum: 3714190.892
Variance: 92.93301266
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 1117 | 0.4%
19.36820278 | 265 | 0.1%
18.97692596 | 248 | 0.1%
17.41181867 | 242 | 0.1%
17.21618025 | 237 | 0.1%
20.93331008 | 234 | 0.1%
18.58564914 | 233 | 0.1%
18.39001072 | 230 | 0.1%
19.75947961 | 229 | 0.1%
18.78128755 | 228 | 0.1%
Other values (63242) | 251271 | 84.7%
(Missing) | 42226 | 14.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1117 | 0.4%
0.1915285258 | 1 | < 0.1%
0.1918425842 | 1 | < 0.1%
0.1920630387 | 1 | < 0.1%
0.1921892402 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
151.8093626 | 1 | < 0.1%
108.2943748 | 1 | < 0.1%
103.7071195 | 1 | < 0.1%
100.9446188 | 1 | < 0.1%
98.73216077 | 1 | < 0.1%

$PM_{10}$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 1665
Unique (%): 0.7%
Missing: 46542
Missing (%): 15.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.336687608405462
Minimum: 0.0
Maximum: 969.4
Zeros: 19
Zeros (%): < 0.1%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 5.7
Q1: 14
Median: 22.2
Q3: 32.4
95-th percentile: 54.6
Maximum: 969.4
Range: 969.4
Interquartile range (IQR): 18.4

Descriptive statistics

Standard deviation: 17.90939019
Coefficient of variation (CV): 0.7068560209
Kurtosis: 126.2346366
Mean: 25.33668761
Median Absolute Deviation (MAD): 12.09399281
Skewness: 5.306224199
Sum: 6339695.3
Variance: 320.7462568
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
19.4 | 818 | 0.3%
18.6 | 808 | 0.3%
18 | 807 | 0.3%
19.1 | 801 | 0.3%
20.8 | 799 | 0.3%
17.3 | 797 | 0.3%
20 | 796 | 0.3%
22.7 | 795 | 0.3%
16.3 | 794 | 0.3%
15.6 | 792 | 0.3%
Other values (1655) | 242211 | 81.6%
(Missing) | 46542 | 15.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 19 | < 0.1%
0.1 | 42 | < 0.1%
0.2 | 34 | < 0.1%
0.3 | 33 | < 0.1%
0.4 | 35 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
969.4 | 1 | < 0.1%
878.4 | 1 | < 0.1%
832.9 | 1 | < 0.1%
656.8 | 1 | < 0.1%
598 | 1 | < 0.1%

$NO_2$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 111995
Unique (%): 43.7%
Missing: 40747
Missing (%): 13.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.152376988969056
Minimum: 0.0
Maximum: 268.6440491107547
Zeros: 217
Zeros (%): 0.1%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.929146911
Q1: 12.58186922
Median: 19.41337046
Q3: 27.73208362
95-th percentile: 42.75497444
Maximum: 268.6440491
Range: 268.6440491
Interquartile range (IQR): 15.1502144

Descriptive statistics

Standard deviation: 12.08765172
Coefficient of variation (CV): 0.5714559514
Kurtosis: 4.257981515
Mean: 21.15237699
Median Absolute Deviation (MAD): 9.28864023
Skewness: 1.234783773
Sum: 5415283.49
Variance: 146.111324
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 217 | 0.1%
16.31439814 | 149 | 0.1%
17.06448541 | 141 | < 0.1%
19.46565993 | 139 | < 0.1%
17.62705086 | 134 | < 0.1%
16.12687632 | 133 | < 0.1%
17.43952905 | 131 | < 0.1%
15.37678905 | 130 | < 0.1%
13.68909269 | 129 | < 0.1%
20.25235631 | 127 | < 0.1%
Other values (111985) | 254583 | 85.8%
(Missing) | 40747 | 13.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 217 | 0.1%
0.01875092284 | 46 | < 0.1%
0.03750184567 | 38 | < 0.1%
0.05625276851 | 14 | < 0.1%
0.07500369135 | 21 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
268.6440491 | 1 | < 0.1%
204.2906119 | 1 | < 0.1%
165.6963269 | 1 | < 0.1%
157.4632923 | 1 | < 0.1%
153.2053251 | 1 | < 0.1%

NO (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 154118
Unique (%): 60.2%
Missing: 40827
Missing (%): 13.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.58717839929263
Minimum: 0.0
Maximum: 684.9544090228212
Zeros: 1078
Zeros (%): 0.4%
Memory size: 2.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.469860324
Q1: 9.339641486
Median: 23.8082045
Q3: 47.04165279
95-th percentile: 130.9588906
Maximum: 684.954409
Range: 684.954409
Interquartile range (IQR): 37.7020113

Descriptive statistics

Standard deviation: 46.0990583
Coefficient of variation (CV): 1.226457006
Kurtosis: 12.06897714
Mean: 37.5871784
Median Absolute Deviation (MAD): 29.7755884
Skewness: 2.967953558
Sum: 9619799.329
Variance: 2125.123176
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 1078 | 0.4%
0.9767468513 | 182 | 0.1%
0.8546534949 | 174 | 0.1%
0.6104667821 | 172 | 0.1%
1.220933564 | 162 | 0.1%
1.465120277 | 161 | 0.1%
0.4883734257 | 159 | 0.1%
1.587213633 | 157 | 0.1%
0.7325601385 | 154 | 0.1%
1.343026921 | 153 | 0.1%
Other values (154108) | 253381 | 85.4%
(Missing) | 40827 | 13.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1078 | 0.4%
0.01223147655 | 5 | < 0.1%
0.01231627782 | 1 | < 0.1%
0.01235371964 | 1 | < 0.1%
0.01238299881 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
684.954409 | 1 | < 0.1%
647.7467967 | 1 | < 0.1%
560.8769782 | 1 | < 0.1%
545.6961459 | 1 | < 0.1%
545.658232 | 1 | < 0.1%

station
Categorical

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 MiB
Common values
Value | Count | Frequency (%)
PARALELA-CAB | 43824 | 14.8%
DIQUE DO TORORÓ | 43680 | 14.7%
RIO VERMELHO | 43680 | 14.7%
CAMPO GRANDE | 43656 | 14.7%
PIRAJÁ | 43608 | 14.7%
AV ACM - DETRAN | 26256 | 8.8%
ITAIGARA | 26040 | 8.8%
AV BARROS REIS | 26016 | 8.8%

Length

Max length: 15
Mean length: 11.64965629
Min length: 6

Value | Count | Frequency (%)
Uppercase_Letter | 22 | 91.7%
Dash_Punctuation | 1 | 4.2%
Space_Separator | 1 | 4.2%

Value | Count | Frequency (%)
Latin | 22 | 91.7%
Common | 2 | 8.3%

Value | Count | Frequency (%)
ASCII | 22 | 100.0%

lat
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -12.969701134695887
Minimum: -13.005500404304225
Maximum: -12.898903466026768
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: -13.0055004
5-th percentile: -13.0055004
Q1: -12.98973907
Median: -12.98371943
Q3: -12.95380924
95-th percentile: -12.89890347
Maximum: -12.89890347
Range: 0.1065969383
Interquartile range (IQR): 0.03592983707

Descriptive statistics

Standard deviation: 0.03339157866
Coefficient of variation (CV): -0.002574583509
Kurtosis: 0.2242353465
Mean: -12.96970113
Median Absolute Deviation (MAD): 0.02645618704
Skewness: 1.153648667
Sum: -3848888.509
Variance: 0.001114997526
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-13.0055004 -12.99233172 -12.98672925 -12.98086074 -12.97112677 -12.95903037 -12.92635635 -12.89890347], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
-12.95380924 | 43824 | 14.8%
-12.98371943 | 43680 | 14.7%
-13.0055004 | 43680 | 14.7%
-12.98973907 | 43656 | 14.7%
-12.89890347 | 43608 | 14.7%
-12.97800204 | 26256 | 8.8%
-12.99492436 | 26040 | 8.8%
-12.9642515 | 26016 | 8.8%

Minimum 5 values
Value | Count | Frequency (%)
-13.0055004 | 43680 | 14.7%
-12.99492436 | 26040 | 8.8%
-12.98973907 | 43656 | 14.7%
-12.98371943 | 43680 | 14.7%
-12.97800204 | 26256 | 8.8%

Maximum 5 values
Value | Count | Frequency (%)
-12.89890347 | 43608 | 14.7%
-12.95380924 | 43824 | 14.8%
-12.9642515 | 26016 | 8.8%
-12.97800204 | 26256 | 8.8%
-12.98371943 | 43680 | 14.7%

lon
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -38.478602028828334
Minimum: -38.520087964258316
Maximum: -38.4283765135188
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.3 MiB

Quantile statistics

Minimum: -38.52008796
5-th percentile: -38.52008796
Q1: -38.50698772
Median: -38.4793294
Q3: -38.45784983
95-th percentile: -38.42837651
Maximum: -38.42837651
Range: 0.09171145074
Interquartile range (IQR): 0.04913788754

Descriptive statistics

Standard deviation: 0.02876902065
Coefficient of variation (CV): -0.0007476628343
Kurtosis: -0.810422634
Mean: -38.47860203
Median Absolute Deviation (MAD): 0.02321297532
Skewness: 0.2580242339
Sum: -11418909.94
Variance: 0.0008276565494
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-38.52008796 -38.49708083 -38.48325167 -38.47734722 -38.47214644 -38.46338883 -38.44311317 -38.42837651], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
-38.42837651 | 43824 | 14.8%
-38.48717394 | 43680 | 14.7%
-38.50698772 | 43680 | 14.7%
-38.52008796 | 43656 | 14.7%
-38.45784983 | 43608 | 14.7%
-38.46892783 | 26256 | 8.8%
-38.47536505 | 26040 | 8.8%
-38.4793294 | 26016 | 8.8%

Minimum 5 values
Value | Count | Frequency (%)
-38.52008796 | 43656 | 14.7%
-38.50698772 | 43680 | 14.7%
-38.48717394 | 43680 | 14.7%
-38.4793294 | 26016 | 8.8%
-38.47536505 | 26040 | 8.8%

Maximum 5 values
Value | Count | Frequency (%)
-38.42837651 | 43824 | 14.8%
-38.45784983 | 43608 | 14.7%
-38.46892783 | 26256 | 8.8%
-38.47536505 | 26040 | 8.8%
-38.4793294 | 26016 | 8.8%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
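
The calculation described above can be sketched in plain Python. This is an illustrative implementation rather than the code the report itself runs; the function name `pearson_r` and the sample values are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]           # exact linear relationship
print(pearson_r(xs, ys))               # 1.0

# Shifting and rescaling one variable (positive scale) leaves r unchanged.
shifted = [10 * x + 3 for x in xs]
print(pearson_r(shifted, ys))

# Reversing the direction of the relationship flips the sign.
print(pearson_r(xs, [-y for y in ys]))
```

The last two calls illustrate the invariance property: a change of location and positive scale leaves r unchanged, while reversing the direction of the relationship flips its sign.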

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
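
In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this in plain Python, assuming tied values receive the average of the ranks they span (a common convention, not something this report states); the helper names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]        # monotonic but not linear
print(spearman_rho(xs, ys))      # 1.0
print(pearson_r(xs, ys) < 1.0)   # True
```

The cubic example shows the difference between the two coefficients: the relationship is perfectly monotonic, so ρ is 1, while Pearson's r falls short of 1 because the relationship is not linear.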

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
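
The pair-counting formula above corresponds to the variant usually called tau-a. The following plain-Python sketch implements that formula literally; tied pairs count as neither concordant nor discordant. Statistical libraries often default to a tie-adjusted variant (tau-b), so results on data with ties may differ from this sketch. The function name and sample data are made up for the example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]   # one swapped pair out of six
print(round(kendall_tau_a(xs, ys), 4))   # 0.6667
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = (5 - 1) / 6.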

Missing values

Sample

First rows

(row) | df_index | $SO_2$ (µg/m3) | CO (ppm) | $O_3$ (µg/m3) | $PM_{10}$ (µg/m3) | $NO_2$ (µg/m3) | NO (µg/m3) | station | lat | lon
0 | 2013-01-02 00:00:00 | 2.345836 | 0.53 | 11.521811 | 26.5 | 18.904214 | 10.744251 | AV ACM - DETRAN | -12.978002 | -38.468928
1 | 2013-01-02 01:00:00 | 3.915609 | 0.52 | 13.299354 | 20.1 | 10.309862 | 0.978220 | AV ACM - DETRAN | -12.978002 | -38.468928
2 | 2013-01-02 02:00:00 | 4.442517 | 0.51 | 14.684361 | 24.0 | 5.442022 | 0.734463 | AV ACM - DETRAN | -12.978002 | -38.468928
3 | 2013-01-02 03:00:00 | 4.707386 | 0.51 | 16.067001 | 17.5 | 4.131542 | 0.490011 | AV ACM - DETRAN | -12.978002 | -38.468928
4 | 2013-01-02 04:00:00 | 4.708964 | 0.52 | 16.660399 | 11.7 | 4.508647 | 0.490175 | AV ACM - DETRAN | -12.978002 | -38.468928
5 | 2013-01-02 05:00:00 | 4.969324 | 0.54 | 17.831946 | 14.9 | 3.380636 | 1.225130 | AV ACM - DETRAN | -12.978002 | -38.468928
6 | 2013-01-02 06:00:00 | 5.467229 | 0.54 | 15.409505 | 11.9 | 10.095408 | 2.195123 | AV ACM - DETRAN | -12.978002 | -38.468928
7 | 2013-01-02 07:00:00 | 5.428762 | 0.60 | 11.040024 | 19.9 | 18.192390 | 10.535112 | AV ACM - DETRAN | -12.978002 | -38.468928
8 | 2013-01-02 08:00:00 | 5.145123 | 0.55 | 17.539643 | 17.2 | 15.702427 | 14.460568 | AV ACM - DETRAN | -12.978002 | -38.468928
9 | 2013-01-02 09:00:00 | 4.871014 | 0.49 | 19.399959 | 11.3 | 20.618923 | 18.493747 | AV ACM - DETRAN | -12.978002 | -38.468928

Last rows

(row) | df_index | $SO_2$ (µg/m3) | CO (ppm) | $O_3$ (µg/m3) | $PM_{10}$ (µg/m3) | $NO_2$ (µg/m3) | NO (µg/m3) | station | lat | lon
296750 | 2015-12-31 14:00:00 | 1.300037 | 0.05 | 10.129842 | 7.5 | 20.874190 | 31.848998 | RIO VERMELHO | -13.0055 | -38.487174
296751 | 2015-12-31 15:00:00 | 1.300037 | 0.10 | 11.298670 | 6.8 | 19.753930 | 26.965079 | RIO VERMELHO | -13.0055 | -38.487174
296752 | 2015-12-31 16:00:00 | 0.780022 | 0.10 | 11.103865 | 19.7 | 19.921969 | 27.233024 | RIO VERMELHO | -13.0055 | -38.487174
296753 | 2015-12-31 17:00:00 | 1.300037 | 0.10 | 10.324646 | 13.9 | 21.266282 | 33.639362 | RIO VERMELHO | -13.0055 | -38.487174
296754 | 2015-12-31 18:00:00 | 0.780022 | 0.12 | 12.662302 | 19.4 | 19.697917 | 26.867644 | RIO VERMELHO | -13.0055 | -38.487174
296755 | 2015-12-31 19:00:00 | 0.260007 | 0.10 | 13.246716 | 20.6 | 21.863754 | 27.732378 | RIO VERMELHO | -13.0055 | -38.487174
296756 | 2015-12-31 20:00:00 | 0.520015 | 0.08 | 13.636325 | 18.8 | 24.608393 | 27.659302 | RIO VERMELHO | -13.0055 | -38.487174
296757 | 2015-12-31 21:00:00 | 0.260007 | 0.10 | 14.220739 | 29.5 | 23.170725 | 23.311274 | RIO VERMELHO | -13.0055 | -38.487174
296758 | 2015-12-31 22:00:00 | 0.520015 | 0.20 | 15.779177 | 27.2 | 23.077370 | 22.580513 | RIO VERMELHO | -13.0055 | -38.487174
296759 | 2015-12-31 23:00:00 | 0.260007 | 0.22 | 14.415544 | 23.6 | 22.479897 | 25.125997 | RIO VERMELHO | -13.0055 | -38.487174